A Novel Approach to Creating Disambiguated Multilingual Dictionaries

Authors

  • Igor Boguslavsky
  • Jesús Cardeñosa
  • Carolina Gallardo
Abstract

Multilingual lexicons are needed in various applications, such as cross-lingual information retrieval and machine translation. These applications often suffer from the ambiguity of dictionary items, especially when an intermediate natural language is involved in dictionary construction, since that language adds its own ambiguity to the ambiguity of the working languages. This paper proposes a new method for producing multilingual dictionaries without the risk of introducing additional ambiguity. As a disambiguated intermediate language we use the so-called Universal Words. A set of more than 200,000 unambiguous Universal Words has been constructed automatically on the basis of the well-known English lexical database WordNet. This approach is being used to construct a five-language dictionary in the field of cultural heritage within the framework of the PATRILEX project, sponsored by the Spanish Research Council.
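
The ambiguity problem described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy entries, the function names, and the Universal Word labels are hypothetical and chosen only for illustration. The first function joins two bilingual word lists through a plain English pivot word. The second joins them through sense-specific pivot labels.

```python
# Toy data (hypothetical): Spanish and Russian entries linked through English "bank".
ES_EN = {"banco": ["bank"], "orilla": ["bank"]}
EN_RU = {"bank": ["банк", "берег"]}

# The same entries keyed by sense-specific pivot labels.
# These label strings are illustrative assumptions, not taken from the paper.
ES_UW = {
    "banco": ["bank(icl>financial_institution)"],
    "orilla": ["bank(icl>shore)"],
}
RU_UW = {
    "банк": ["bank(icl>financial_institution)"],
    "берег": ["bank(icl>shore)"],
}


def pivot_by_word(src_to_pivot, pivot_to_tgt):
    """Join two bilingual dictionaries through an ambiguous pivot word."""
    result = {}
    for src, pivots in src_to_pivot.items():
        targets = []
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, []):
                if tgt not in targets:
                    targets.append(tgt)
        result[src] = targets
    return result


def pivot_by_uw(src_to_uw, tgt_to_uw):
    """Join two lexicons through shared sense-specific pivot labels."""
    # Invert the target lexicon: pivot label -> target words.
    uw_to_tgt = {}
    for tgt, uws in tgt_to_uw.items():
        for uw in uws:
            uw_to_tgt.setdefault(uw, []).append(tgt)
    result = {}
    for src, uws in src_to_uw.items():
        targets = []
        for uw in uws:
            for tgt in uw_to_tgt.get(uw, []):
                if tgt not in targets:
                    targets.append(tgt)
        result[src] = targets
    return result


if __name__ == "__main__":
    print(pivot_by_word(ES_EN, EN_RU))
    print(pivot_by_uw(ES_UW, RU_UW))
```

With the word pivot, both Spanish entries receive both Russian words, so each gains one spurious translation. With the sense-specific pivot, each Spanish entry is linked only to the Russian word that shares its sense.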

Similar resources

Using Multilingual Topic Models for Improved Alignment in English-Hindi MT

Parallel corpora are often injected with bilingual dictionaries for improved Indian language machine translation (MT). In the absence of such dictionaries, a coarse dictionary may be required. This paper demonstrates the use of a multilingual topic model for creating coarse dictionaries for English-Hindi MT. We compare our approaches with: (a) a baseline with no additional dictionary injection, and...

Extracting Multilingual Dictionaries for the Teaching

This paper describes a method for creating multilingual dictionaries using Wikipedia as a resource. A lucky strike on the road to multilingual information retrieval, the main idea is simple: taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages produces a multilingual dictionary in all those languages. While the page content...

The PAPILLON Project: Cooperatively Building A Multilingual Lexical Data-Base To Derive Open Source Dictionaries And Lexicons

The PAPILLON project aims at creating a cooperative, free, permanent, web-oriented and personalizable environment for the development and the consultation of a multilingual lexical database. The initial motivation is the lack of dictionaries, both for humans and machines, between French and many Asian languages. In particular, although there are large F-J paper usage dictionaries, they are usab...

Automatically Creating Multilingual Lexical Resources

The thesis proposes creating bilingual dictionaries and Wordnets for languages without many lexical resources using resources of resource-rich languages. Our work will have the advantage of creating lexical resources, reducing time and cost and at the same time improving the quality of resources created.

Bilingual embeddings with random walks over multilingual wordnets

Bilingual word embeddings represent words of two languages in the same space, and allow knowledge to be transferred from one language to the other without machine translation. The main approach is to train monolingual embeddings first and then map them using bilingual dictionaries. In this work, we present a novel method to learn bilingual embeddings based on multilingual knowledge bases (KB) such as...

Publication date: 2008